library(knitr)       # report rendering
library(kableExtra)  # table formatting
library(data.table)  # data manipulation
library(ggplot2)     # plotting

1 Sexual relationships data

Participants were asked to report the number of sexual partners they had in the past 12 months and to provide age information for at most four of these partners. Participants are from fishing and inland communities and were surveyed in round R015.

In total, 16327 participants are included: 3705 from fishing communities (1781 Female and 1924 Male) and 12622 from inland communities (6951 Female and 5671 Male).
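As a quick arithmetic check, the counts above can be rebuilt as a community-by-gender table in base R. The row and overall totals reproduce 3705, 12622 and 16327.

```r
# Participant counts by community and gender, as reported above
counts <- matrix(c(1781, 1924,
                   6951, 5671),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(community = c("fishing", "inland"),
                                 gender    = c("Female", "Male")))

community_totals <- rowSums(counts)   # fishing 3705, inland 12622
overall_total    <- sum(counts)       # 16327
addmargins(as.table(counts))          # table with a "Sum" row and column
```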

1.1 Reports visualisation

In the survey, 33 participants (12 from inland and 21 from fishing) reported ‘>3’ instead of a number.

After removing these non-numeric reports and the very large counts, we visualise the remaining reports.
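Here is a minimal sketch of this cleaning step. It assumes the reported counts arrive as a character column (hypothetically named `n_partners`). It also uses a hypothetical cut-off `max_plausible`, because the report does not state which values were treated as too large.

```r
# Hypothetical raw reports: the field is character, with entries such as ">3"
raw <- data.frame(
  id         = 1:6,
  community  = c("inland", "fishing", "fishing", "inland", "inland", "fishing"),
  n_partners = c("0", "2", ">3", "1", "69", "250"),
  stringsAsFactors = FALSE
)

# HYPOTHETICAL cut-off for implausibly large counts (not stated in the report)
max_plausible <- 100

clean_reports <- function(df, max_plausible) {
  # Non-numeric entries such as ">3" become NA; silence the coercion warning
  num  <- suppressWarnings(as.numeric(df$n_partners))
  keep <- !is.na(num) & num <= max_plausible
  out  <- df[keep, ]
  out$n_partners <- num[keep]
  out
}

cleaned <- clean_reports(raw, max_plausible)
print(cleaned)
```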

One Female participant from inland, aged 30, reported 69 partners, and 2938 participants reported 0 contacts.

We plot the number of participants against the number of different partners reported in the last 12 months, by community, gender and age group (15-24, 25-34, 35-49).

We then clip the plot at 15 reported partners.
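The sketch below shows the tabulation and plot described above on simulated data, with hypothetical column names. It assumes the third age group is 35-49, so that the groups do not overlap. It clips the view at 15 partners with `coord_cartesian()`. Unlike `xlim()` or `scale_x_continuous(limits = ...)`, this keeps out-of-range reports in the data instead of dropping them.

```r
library(ggplot2)

# Hypothetical cleaned participant-level data (one row per participant)
set.seed(1)
n <- 400
df <- data.frame(
  community  = rep(c("fishing", "inland"), each = n / 2),
  gender     = rep(c("Female", "Male"), times = n / 2),
  age        = sample(15:49, n, replace = TRUE),
  n_partners = rpois(n, lambda = 1.5)
)
df$n_partners[1] <- 40   # one extreme report that would stretch the x-axis

# Non-overlapping age groups; with integer ages the right-closed bins are
# (14, 24] = 15-24, (24, 34] = 25-34, (34, 49] = 35-49
df$age_group <- cut(df$age, breaks = c(14, 24, 34, 49),
                    labels = c("15-24", "25-34", "35-49"))

# Number of participants per reported partner count,
# by community, gender and age group
counts <- aggregate(age ~ community + gender + age_group + n_partners,
                    data = df, FUN = length)
names(counts)[names(counts) == "age"] <- "n_participants"

# Bar chart zoomed to 0-15 partners without removing the extreme report
p <- ggplot(counts, aes(x = n_partners, y = n_participants, fill = age_group)) +
  geom_col(position = "dodge") +
  facet_grid(community ~ gender) +
  coord_cartesian(xlim = c(0, 15)) +
  labs(x = "Number of partners reported in the last 12 months",
       y = "Number of participants", fill = "Age group")
```

Calling `print(p)` renders the figure.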

1.2 Missing partner information

Participants were asked to provide detailed information on at most four of their partners, so the partner's age is not available for every reported partnership. For example, a participant who reported 3 partners may have provided the age of only one of them, which leaves 2 partnerships without age information.

We plot the age distribution of participants, by gender and community, for three groups: participants who reported 0 partners, participants with no missing partner details, and participants with missing partner details.
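The three groups can be derived as in this minimal sketch. It assumes hypothetical columns `n_partners` (number of partners reported) and `n_age_details` (number of partners with an age given, at most four).

```r
# Hypothetical per-participant data
df <- data.frame(
  id            = 1:5,
  n_partners    = c(0, 1, 3, 4, 7),
  n_age_details = c(0, 1, 1, 4, 4)
)

# Partnerships without age information; anyone reporting more than
# four partners necessarily has at least one
df$n_missing <- df$n_partners - df$n_age_details

# The three groups compared in the age-distribution plot
df$group <- ifelse(df$n_partners == 0, "0 reports",
            ifelse(df$n_missing == 0, "no missing details", "missing details"))
print(df)
```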

The proportion of participants with missing partner details, by gender and community:
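With a logical flag per participant (hypothetically named `has_missing_details`), these proportions are simply group means. Here is a base-R sketch with made-up rows.

```r
# Hypothetical participant-level data with a missing-details flag
df <- data.frame(
  community = c("fishing", "fishing", "fishing", "inland", "inland", "inland"),
  gender    = c("Female", "Female", "Male", "Male", "Male", "Female"),
  has_missing_details = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
)

# The mean of a logical flag is the proportion of TRUE values
prop_missing <- aggregate(has_missing_details ~ community + gender,
                          data = df, FUN = mean)
print(prop_missing)
```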

1.3 Data visualisation with detailed age information

We then show only the reported partners for whom detailed age information is available:

2 Method

The plots above show that many reported partnerships lack detailed information on the partner's age. To account for this missing information, we add an offset term \(\rho\) that adjusts the reports.

We have

  • the number of reported partnerships from participants aged \(a\): \(Y_{a}\).
  • the number of reported partnerships from participants aged \(a\) for which the partner's age \(b\) is reported: \(Z_{a,b}\).
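One way to tabulate \(Y_a\) and \(Z_{a,b}\) is sketched below. It assumes two hypothetical tables: one row per participant with the total number of reported partnerships, and one row per partner whose age was reported. The row totals of \(Z_{a,b}\) give the detailed counts \(Z_a\), and \(Z_a/Y_a\) gives the reporting proportion \(\rho_a\) of Section 2.2.

```r
# Hypothetical participant table: one row per participant
participants <- data.frame(
  id         = 1:4,
  age        = c(20, 20, 30, 30),
  n_partners = c(2, 1, 3, 0)
)
# Hypothetical partner table: one row per partner whose age was reported
partners <- data.frame(
  id          = c(1, 1, 2, 3),
  partner_age = c(19, 22, 18, 28)
)

# Y_a: total reported partnerships from participants aged a
Y <- sapply(split(participants$n_partners, participants$age), sum)

# Z_{a,b}: partnerships with reported partner age b, from participants aged a
detailed <- merge(partners, participants[, c("id", "age")], by = "id")
Z <- table(participant_age = detailed$age, partner_age = detailed$partner_age)

# Detailed totals Z_a and the reporting proportion rho_a = Z_a / Y_a
Z_a <- rowSums(Z)
rho <- Z_a / Y[names(Z_a)]
```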

2.1 Basics

We consider a survey that records for each participant \(i\) the number of daily contacts to the group of individuals in age band \(b\) in the population, and denote this by \(Z_{ib}\). We assume there is no reporting bias, and so for a participant \(i\) of age \(a\), \[ \mathbb{E}(Z_{ib}) = m_{ab}, \] where \(m_{ab}\) is the expected number of contacts from ONE person of age \(a\) to the age group \(b\), also known as the CONTACT INTENSITY.

A common confusion is to assume that \(m_{ab}\) is symmetric in \((a,b)\). It is NOT, because the age composition of the population is NOT uniform. Let us denote the number of individuals of age \(a\) in the POPULATION by \(P_a\). The total number of contacts from age group \(a\) to age group \(b\) must equal the total from \(b\) to \(a\), so we only have \[ P_a m_{ab} = P_b m_{ba}. \] The above equality motivates defining the CONTACT RATE \[ c_{ab} = m_{ab}/P_b \] and we have \(c_{ab}=c_{ba}\). This symmetry means that we only need to learn about half of the parameters in the field \(\{m_{ab}, a\in\mathcal{A}, b\in\mathcal{A}\}\).
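The reciprocity relation can be illustrated numerically with made-up population sizes and a symmetric contact-rate matrix. The intensities \(m_{ab} = c_{ab} P_b\) are not symmetric, but the total contacts \(P_a m_{ab}\) are.

```r
# Hypothetical population sizes for three age bands, P_a
P <- c(100, 200, 50)

# A symmetric contact-rate matrix c_ab (illustrative values)
C <- matrix(c(0.010, 0.004, 0.002,
              0.004, 0.008, 0.003,
              0.002, 0.003, 0.012), nrow = 3, byrow = TRUE)

# Contact intensities m_ab = c_ab * P_b: scale column b by P_b
M <- sweep(C, 2, P, `*`)

# Total contacts P_a * m_ab: P is recycled down the columns, scaling row a by P_a
total_contacts <- P * M

print(isSymmetric(M))               # m_ab is not symmetric
print(isSymmetric(total_contacts))  # P_a m_ab = P_b m_ba
```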

We assume that all individuals of age \(a\) are equally likely to be sampled, and that all individuals of age \(a\) have the same contact intensity \(m_{ab}\). Denote the number of participants of age \(a\) in the SAMPLE by \(N_a\). Then the van Kassteele model considers

\[ Z_{ab} = \sum_{i=1}^{N_a} Z_{ib}, \] where the sum runs over the \(N_a\) participants of age \(a\). It models this response variable with either a negative binomial or a Poisson likelihood: \[\begin{align*} & Z_{ab} \sim \text{NegativeBinomial}( \mu_{ab}, \phi )\\ & \log \mu_{ab} = \beta + \chi_{ab} + \log(N_a) + \log(P_b);\\ & Z_{ab} \sim \text{Poisson}( \mu_{ab})\\ & \log \mu_{ab} = \beta + \chi_{ab} + \log(N_a) + \log(P_b). \end{align*}\] Here, \(\mu_{ab} = N_a m_{ab}\) is the expected sum of contacts from the participants of age \(a\) to age group \(b\), \(\beta\) is a baseline (intercept) term, and \(\chi_{ab}\) is a random effect that is given a smoothing prior of choice. Since \[ \mu_{ab} / ( N_a P_b) = \exp( \beta + \chi_{ab} ), \] the contact rate \(c_{ab}\) is given by \(\exp( \beta + \chi_{ab} )\). Thus, we can learn the upper-triangular field \(\{ c_{ab}, a\in\mathcal{A}, b\in\mathcal{A}, b\geq a\}\) and then calculate the desired contact intensities \(\{ m_{ab} = c_{ab} P_b, a\in\mathcal{A}, b\in\mathcal{A}\}\) as generated quantities.
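Here is a minimal sketch of the Poisson variant on simulated data. As a simplification (not the method used in this report), \(\beta + \chi_{ab}\) is replaced by one free parameter per unordered age pair instead of a smoothing prior. Sharing a parameter between cells \((a,b)\) and \((b,a)\) is what enforces \(c_{ab}=c_{ba}\). The age bands, sample sizes and population sizes are hypothetical.

```r
set.seed(42)
K <- 5                                  # number of age bands (hypothetical)
N <- c(300, 250, 200, 150, 100)         # participants per age band, N_a
P <- c(5000, 4500, 4000, 3500, 3000)    # population per age band, P_b

# Symmetric "true" contact rates c_ab (illustrative values)
C_true <- outer(1:K, 1:K, function(a, b) 1e-3 * exp(-abs(a - b) / 2))

# Long format: one row per (a, b) cell; a varies fastest
d <- expand.grid(a = 1:K, b = 1:K)
d$Z <- rpois(nrow(d), N[d$a] * C_true[cbind(d$a, d$b)] * P[d$b])

# Cells (a, b) and (b, a) share one parameter, which encodes c_ab = c_ba
d$pair <- factor(paste(pmin(d$a, d$b), pmax(d$a, d$b), sep = "-"))

# Offset log(N_a) + log(P_b), as in the model above
d$off <- log(N[d$a]) + log(P[d$b])

# Poisson log-linear fit with one free log-rate per unordered age pair
fit <- glm(Z ~ 0 + pair + offset(off), family = poisson, data = d)

# Contact rates c_ab on the K x K grid (rows a, columns b)
C_hat <- matrix(exp(coef(fit)[paste0("pair", d$pair)]), nrow = K)

# Generated quantities: contact intensities m_ab = c_ab * P_b
M_hat <- sweep(C_hat, 2, P, `*`)
print(round(C_hat / C_true, 2))
```

Under a smoothing prior, the pair parameters would borrow strength from neighbouring ages. Here each one is estimated only from its own two cells.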

2.2 Adjusted partnership reports

We denote

  • the number of contacts from participants of age \(a\) by \(Y_a\);
  • the number of contacts from participants of age \(a\) that are reported in detail by \(Z_a\);
  • the sum of contacts from participants of age \(a\) to age group \(b\) that are reported in detail by \(Z_{ab}\).

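We assume that whether a contact is reported in detail is independent of how many contacts an individual has. We also assume that all individuals of age \(a\) have the same reporting probability, so the same proportion of their contacts is reported in detail whatever the partner age \(b\). Write \(Y_{ab}\) for the total number of contacts (detailed or not) from participants of age \(a\) to age group \(b\), with \(\mathbb{E}(Y_{ab}) = \mu_{ab}\). We then have \[ \mathbb{E}( Z_{ab} ) = \mathbb{E}( \frac{Z_a}{Y_a} Y_{ab} ) = \mathbb{E}( \frac{Z_a}{Y_a} ) \: \mu_{ab}, \] where the second equality uses the independence assumption, and \(\rho_a = Z_a/Y_a\) is an additional offset to adjust the reports.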
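A quick simulation illustrates this argument. It takes one concrete reading of the reporting assumption: each contact is independently reported in detail with probability \(p_a\) (binomial thinning). It also uses a single partner age group, so that \(Y_a = Y_{ab}\). All values are hypothetical.

```r
set.seed(7)
n_sim <- 1e5     # number of simulated participants of age a (hypothetical)
mu_ab <- 4       # hypothetical expected contacts to age group b, E(Y_ab)
p_a   <- 0.6     # hypothetical probability that a contact is reported in detail

# All contacts Y_ab, then independent binomial thinning to the detailed ones
Y_ab <- rpois(n_sim, mu_ab)
Z_ab <- rbinom(n_sim, size = Y_ab, prob = p_a)

# Empirical reporting proportion rho_a, pooled over participants
rho_hat <- sum(Z_ab) / sum(Y_ab)

# Under these assumptions E(Z_ab) = rho_a * mu_ab
print(c(mean_Z = mean(Z_ab), rho_times_mu = rho_hat * mu_ab))
```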

This prompted us to extend the van Kassteele model by adding \(\log(\rho_a)\) to the offset, with either likelihood: \[\begin{align*} & Z_{ab} \sim \text{NegativeBinomial}( \tilde{\mu}_{ab}, \phi )\\ & \log \tilde{\mu}_{ab} = \beta + \chi_{ab} + \log(N_a) + \log(P_b) + \log(\rho_a);\\ & Z_{ab} \sim \text{Poisson}( \tilde{\mu}_{ab})\\ & \log \tilde{\mu}_{ab} = \beta + \chi_{ab} + \log(N_a) + \log(P_b) + \log(\rho_a). \end{align*}\]
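The sketch below works on simulated data. As a simplification, it uses one free parameter per unordered age pair in place of a smoothing prior on \(\chi_{ab}\). It simulates detailed reports by binomial thinning and computes the empirical \(\rho_a = Z_a/Y_a\). It then compares fits with and without \(\log(\rho_a)\) added to the offset \(\log(N_a) + \log(P_b)\). The reporting probabilities and sizes are hypothetical.

```r
set.seed(11)
K <- 4
N <- c(200, 180, 160, 140)          # participants per age band, N_a (hypothetical)
P <- c(4000, 3800, 3600, 3400)      # population per age band, P_b (hypothetical)
p_detail <- c(0.9, 0.7, 0.5, 0.4)   # hypothetical per-age probability of giving details

C_true <- outer(1:K, 1:K, function(a, b) 1e-3 * exp(-abs(a - b) / 2))

d <- expand.grid(a = 1:K, b = 1:K)
d$Y <- rpois(nrow(d), N[d$a] * C_true[cbind(d$a, d$b)] * P[d$b])  # all partnerships
d$Z <- rbinom(nrow(d), size = d$Y, prob = p_detail[d$a])           # detailed only

# Empirical rho_a = Z_a / Y_a, aggregated over partner age b
rho <- as.vector(tapply(d$Z, d$a, sum)) / as.vector(tapply(d$Y, d$a, sum))

d$pair      <- factor(paste(pmin(d$a, d$b), pmax(d$a, d$b), sep = "-"))
d$off_basic <- log(N[d$a]) + log(P[d$b])
d$off_rho   <- d$off_basic + log(rho[d$a])

fit_basic <- glm(Z ~ 0 + pair + offset(off_basic), family = poisson, data = d)
fit_rho   <- glm(Z ~ 0 + pair + offset(off_rho),   family = poisson, data = d)

# Contact rates on the (a, b) grid, from one coefficient per unordered pair
to_matrix <- function(fit) matrix(exp(coef(fit)[paste0("pair", d$pair)]), nrow = K)
C_basic <- to_matrix(fit_basic)
C_rho   <- to_matrix(fit_rho)
print(round(C_rho / C_true, 2))
```

Without the \(\log(\rho_a)\) term, the estimated rates are pulled down by roughly the reporting proportions. With it, they should scatter around the true values.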

3 Results

We place a B-spline Gaussian process prior with 30 knots on \(\chi\). The figures above suggest that Female participants may under-report their number of partners. We therefore use the reports from Male participants and, through the symmetry of the contact rates, estimate the partnership rates for Female participants.
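The report does not spell out how the B-spline surface is built. The sketch below shows one common way to build a two-dimensional tensor-product B-spline surface for \(\chi\), under stated assumptions:

- single-year ages 15 to 49;
- "30 knots" read as 30 interior knots;
- iid normal spline coefficients as a placeholder for the Gaussian process prior.

```r
library(splines)

ages <- 15:49                                           # assumed age range
inner_knots <- seq(15, 49, length.out = 32)[-c(1, 32)]  # 30 interior knots (assumption)

# Cubic B-spline basis at each age: 35 ages x (30 + 3 + 1) = 34 basis functions
B <- bs(ages, knots = inner_knots, degree = 3, intercept = TRUE)

# HYPOTHETICAL placeholder: iid normal coefficients stand in for the GP prior
set.seed(3)
Theta <- matrix(rnorm(ncol(B)^2, sd = 0.5), nrow = ncol(B))

# Tensor-product surface chi_ab = sum_ij B_i(a) Theta_ij B_j(b)
chi <- B %*% Theta %*% t(B)
dim(chi)
```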

The results from the Poisson and Negative Binomial models are similar, so we present the figures from the Poisson model in the rest of the report.

3.1 Age-specific partnership contact intensity distribution

Male to Female in fishing community: in this figure, the black dashed line shows the rho-adjusted empirical contact intensities, i.e. \(Z_{a,b}/(N_a\rho_a)\).

In this plot, the black dashed line shows the unadjusted empirical contact intensities, \(Z_{a,b}/N_a\).

Male to Female in inland community with the rho adjusted empirical contact intensities

Male to Female in inland community with the empirical contact intensities

Female to Male in fishing community with the empirical contact intensities

Female to Male in inland community with the empirical contact intensities
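The two empirical curves described in the captions above can be computed as in this minimal sketch. It uses made-up counts, where rows index the participant age band \(a\) and columns the partner age band \(b\).

```r
# Hypothetical detailed reports Z_ab for three participant and three partner age bands
Z   <- matrix(c(12,  5,  1,
                 8, 20,  6,
                 2,  9, 15), nrow = 3, byrow = TRUE)
N   <- c(100, 120, 90)     # participants per age band, N_a (hypothetical)
rho <- c(0.8, 0.6, 0.5)    # reporting proportions rho_a (hypothetical)

# Empirical contact intensities Z_ab / N_a (N is recycled down columns: row a / N_a)
m_emp <- Z / N

# rho-adjusted empirical contact intensities Z_ab / (N_a * rho_a)
m_emp_adj <- Z / (N * rho)
```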

3.2 Contact intensity patterns

Patterns with the empirical contact intensities in fishing

Patterns with the empirical contact intensities in inland

Patterns comparison: fishing and inland

3.3 Age marginal distribution

Fishing: the black dashed line is the empirical value; the purple dashed line is the rho-scaled empirical value.

Inland:

Comparison between communities based on the marginal estimated contact intensities
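The sketch below assumes the empirical marginal for participant age \(a\) is the row total \(Z_a = \sum_b Z_{ab}\) divided by \(N_a\). Under that assumption, the two dashed lines in the fishing panel can be computed as follows with made-up counts. Note that with \(\rho_a = Z_a/Y_a\), the rho-scaled value reduces algebraically to \(Y_a/N_a\).

```r
# Hypothetical detailed reports Z_ab (rows: participant age band a)
Z <- matrix(c(12,  5,  1,
               8, 20,  6,
               2,  9, 15), nrow = 3, byrow = TRUE)
Y <- c(30, 50, 40)       # all reported partnerships Y_a (hypothetical)
N <- c(100, 120, 90)     # participants N_a (hypothetical)

Z_a <- rowSums(Z)        # detailed partnerships Z_a
rho <- Z_a / Y           # rho_a = Z_a / Y_a

marg_emp    <- Z_a / N           # empirical marginal value
marg_scaled <- Z_a / (N * rho)   # rho-scaled value, equal to Y_a / N_a
```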

3.4 Posterior predictive check

Contact intensity patterns, fishing: 3.18% of the Male-reported \(Z_{ab}\) lie outside the 95% posterior predictive intervals.

Contact intensity patterns, inland: 4.57% of the Male-reported \(Z_{ab}\) lie outside the 95% posterior predictive intervals.

Posterior marginal estimated contact reports, fishing: 5.71% of the Male-reported \(Z_a\) lie outside the 95% posterior predictive intervals.

Posterior marginal estimated contact reports, inland: 2.86% of the Male-reported \(Z_a\) lie outside the 95% posterior predictive intervals.
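Percentages like these can be computed from posterior predictive draws as in the sketch below. The function name `ppc_outside` is hypothetical, and the 95% intervals are assumed to be equal-tailed (2.5% and 97.5% quantiles).

```r
# Proportion of observed counts outside central predictive intervals
ppc_outside <- function(draws, observed, level = 0.95) {
  # draws: one row per posterior predictive draw, one column per cell
  alpha <- (1 - level) / 2
  lower <- apply(draws, 2, quantile, probs = alpha)
  upper <- apply(draws, 2, quantile, probs = 1 - alpha)
  mean(observed < lower | observed > upper)
}

# Toy check: every column holds the draws 1..100; two of four observations fall outside
draws    <- matrix(rep(1:100, times = 4), nrow = 100)
observed <- c(50, 0, 101, 60)
ppc_outside(draws, observed)   # returns 0.5
```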